.require "memo.pub[let,jmc]" source;
.turn off "{"
.cb HISTORY OF LISP


1. Introduction.

	As a programming language, LISP is characterized by the following
ideas:  computing with symbolic expressions rather than numbers,
representation of symbolic expressions and other information by list
structure in the memory of a computer,
representation of information in external media by atoms and lists and
secondarily by S-expressions, a small set of selector and constructor
operations expressed as functions, composition of functions as a tool for
forming more complex functions, the use of conditional expressions for
getting branching into function definitions, the recursive use of
conditional expressions as a sufficient tool for building computable
functions, the use of λ-expressions for naming functions, the
representation of LISP programs as LISP data, the conditional expression
interpretation of Boolean connectives, the LISP function %2eval%1 that
serves both as a formal definition of the language and as an interpreter,
and garbage collection as a means of handling the erasure problem.

	Some of these ideas were taken from other languages but most were
new.  Towards the end of the initial period, it became clear that this
combination of ideas made an elegant mathematical system as well as a
practical programming language.  Then mathematical neatness became a goal
and led to pruning some features from the core of the language.  This was
partly motivated by esthetic reasons and partly by the belief that it
would be easier to devise techniques for proving programs correct if the
semantics were compact and without exceptions.

	In addition to its core consisting of recursive conditional
expression and Boolean expression definitions built up from the
three functions %2car%1, %2cdr%1 and %2cons%1 and the predicates
%2atom%1 and %2eq%1, LISP has many features adopted from other
programming languages of the late fifties and early sixties.  A
recurrent issue has always been how to handle free variables in
function definitions.  This problem has also plagued Algol and
other programming languages, and a compromise has always been
necessary between definitional power and high speed implementation
by interpreters and compilers.

	This paper will concentrate on the development of the basic
ideas and will distinguish two periods - summer 1956 through summer
1958 in which the main ideas were developed, and Fall 1958 through
1962 when the programming language was implemented and applied to
problems of artificial intelligence.  The scope of this conference
does not go beyond that period, and anyway after that period,
the development of LISP became multi-stranded with different ideas
being pursued in different places.


2. LISP prehistory - Summer 1956 through Summer 1958.

	The desire to design a list processing language for
artificial intelligence work on the
IBM 704 computer arose in the summer of 1956 at the time of the
Dartmouth Summer Research Project on Artificial Intelligence.
The suitability of list processing for AI research was
first pointed out by Newell, Shaw and Simon when they described
their Logic Theorist program and its implementation in an early
version of IPL on the JOHNNIAC computer at RAND Corporation.
There was little temptation to copy IPL, because its form was
based on a JOHNNIAC loader that happened to be available to them,
and because the FORTRAN idea of writing programs algebraically
seemed much more attractive.  It was immediately apparent that
subsubexpressions of symbolic expressions could
be obtained by composing the functions that extract subexpressions,
and this seemed reason enough to go to
an algebraic language.

	There were two motivations for developing a language for
the IBM 704.  First, IBM had undertaken to establish a New England
Computation Center at M.I.T., and Dartmouth would be able to use it.
Second, IBM was undertaking to develop a program for proving
theorems in plane geometry (based on an idea of Marvin Minsky's), and
I was to serve as a consultant to that project.  At the time, IBM
looked like a good bet to pursue artificial intelligence research
vigorously, and further projects were expected.

	The first problem was to decide how to do list structure
in the IBM 704.  This computer has a 36 bit word, and two 15 bit
parts, called the address and the decrement, were distinguished by
special instructions for moving their contents to and from the
15 bit index registers.  The machine's address field was 15 bits,
so it was clear that list structure should use 15 bit pointers.
Therefore, it was natural to consider the word as divided into
4 parts, the address part, the decrement part, the prefix part
and the tag part.  The last two were three bits each and separated
from each other by the decrement so that they could not be easily
combined into a single six bit part.
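	Schematically (a reconstruction, not a figure from the period),
the fields of the 36 bit word were laid out as follows, with the
prefix and tag separated by the 15 bit decrement:
.begin verbatim
| prefix | decrement | tag | address |
|   3    |    15     |  3  |   15    |   bits
.end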

	At this point there was some indecision about what the
basic operators should be, because the operation of extracting
a part of the word by masking was considered separately from
the operation of taking the contents of an address as a function
of that address.  At the time, it seemed dubious to regard the
latter operation as a function, since its value depended on the
contents of memory at the time the operation was performed, so
it didn't act like a proper mathematical function.  However, the
advantages of treating it grammatically as a function so that
it could be composed were also apparent.

	Therefore, the initial set of functions included %2cwr%1,
standing for "Contents of the Word in Register number" and four
functions that extracted the parts of the word and shifted them
to a standard position at the right of the word.  An additional
function of three arguments that would also extract an arbitrary
bit sequence was thrown in just in case it might be useful.
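	Rendered in a modern LISP (the function names and the treatment
of a word as a 36 bit integer are illustrative reconstructions, not
the original code), the extractors shift each field to the right of
the word:
.begin verbatim
;; A 704 word modeled as a 36 bit integer; each extractor isolates
;; one field and shifts it to the right of the word.
(defun address-part   (w) (ldb (byte 15  0) w))  ; rightmost 15 bits
(defun tag-part       (w) (ldb (byte  3 15) w))
(defun decrement-part (w) (ldb (byte 15 18) w))
(defun prefix-part    (w) (ldb (byte  3 33) w))
.end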

	It was soon noticed that extraction of a subexpression
involved composing the extraction of the address part with %2cwr%1
and continuing along the list involved composing the extraction
of the decrement part with %2cwr%1.  Therefore, the compounds
%2car%1, standing for "Contents of the Address part of Register
number", and its analogs %2cdr%1, %2cpr%1, and %2ctr%1 were defined.
A construct operation for taking a word off the free storage list
and stuffing it with given contents was also obviously required.
At some point a %2cons(a,d,p,t)%1 was defined, but it was regarded
as a subroutine and not as a function with a value.  This work was
done at Dartmouth, but not on a computer, since the New England
Computation Center was not expected to receive its IBM 704 for
another year.
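	In the S-expression notation that came later (none existed in
1956), the point about composition can be seen directly; a minimal
sketch:
.begin verbatim
;; Composing the extraction functions reaches subsubexpressions,
;; and cons rebuilds expressions from parts.
(car (cdr '(PLUS X Y)))          ; => X, the first summand
(car (cdr (cdr '(PLUS X Y))))    ; => Y, the second
(cons 'TIMES (cdr '(PLUS X Y)))  ; => (TIMES X Y)
.end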

	In connection with the plane geometry project it was
decided to implement a list
processing language within FORTRAN, because this seemed to be
the easiest way to get started, and, in those days, writing a
compiler for a new language was believed to take many man-years.
This work was undertaken by Herbert Gelernter and Carl Gerberich
at IBM and led to FLPL, standing for FORTRAN List Processing
Language.  Gelernter and Gerberich noticed that
%2cons%1 should be a function, not just a subroutine, and that
its value should be the location of the word that had been
stuffed.  This permitted new expressions to be constructed out
of subsubexpressions by composing occurrences of %2cons%1.
While expressions could be handled easily in FLPL, and it was
used successfully for the Geometry program, it had neither conditional
expressions nor recursion, and erasing list structure was handled
explicitly by the programmer.

	Conditional expressions were invented in connection with
a set of legal move routines being written in FORTRAN for the
IBM 704 at M.I.T. during 1957-58.  This program did not use list-processing.
The IF statement provided in FORTRAN 1 and FORTRAN 2 was very awkward
to use, and it was natural to invent a function XIF(M,N,P) whose
value was N or P according to whether the expression M was zero or
not.  The function shortened many programs and made them easier to
understand, but it had to be used sparingly, because all three
arguments had to be evaluated before XIF was entered, since XIF was
called as an ordinary FORTRAN function though written in machine language.
This led to the invention of the true conditional expression, in
which only one of N and P is evaluated according to whether M
is true or false, and to a desire for a programming language that
would allow its use.
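	The difference matters for recursive definitions.  A sketch in
later LISP notation (not the 1957 FORTRAN program): with a true
conditional the untaken branch is never evaluated, so the recursion
below terminates; written with a function like XIF, the recursive
call would be evaluated even when n is 0, and it would not.
.begin verbatim
(defun fact (n)
  (cond ((= n 0) 1)                ; only this branch is evaluated at 0
        (t (* n (fact (- n 1))))))
.end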

	I spent the summer of 1958 at IBM and was attracted to
the problem of differentiating algebraic expressions as an example
of symbolic computation.  Doing the single example of differentiation
led to the following innovations:

	a. Writing recursive function definitions using conditional
expressions.  The idea of differentiation is obviously recursive, and
conditional expressions allowed combining the cases into a single
formula.

	b. The %2maplist%1 function that forms a list of applications
of a functional argument to the elements of a list.  This was obviously
wanted for differentiating sums of arbitrarily many terms.  With a bit
more thought, it could be applied to differentiating products.

	c. If one is to use functions as arguments, then one needs
a notation for functions, and it seemed natural to use the λ-notation
of Church.  I didn't understand the rest of his book %2Calculi of
Lambda Conversion%1, so I wasn't tempted to try to implement his
more general mechanism for defining functions.  It used higher order
functionals instead of using conditional expressions.  Conditional
expressions are much more readily implemented on computers.

	d. The recursive definition of differentiation made no
provision for erasure of abandoned list structure.  No solution was
apparent at the time, but the idea of complicating the elegant
definition of differentiation by putting in explicit erasure
was unattractive.

	The differentiation program was not actually implemented that
summer, because FLPL allowed neither conditional expressions nor
recursive use of subroutines.
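	A reconstruction of what the program would have looked like,
combining innovations a, b and c in later S-expression notation
(only variables, constants and sums are handled in this sketch):
.begin verbatim
;; maplist applies f to the successive tails of a list, so a sum of
;; arbitrarily many terms is differentiated term by term.
(defun diff (y x)
  (cond ((atom y) (if (eq y x) 1 0))   ; dx/dx = 1, constants give 0
        ((eq (car y) 'PLUS)
         (cons 'PLUS
               (maplist (lambda (j) (diff (car j) x))
                        (cdr y))))))

;; (diff '(PLUS X Y X) 'X) => (PLUS 1 0 1)
.end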


3. The implementation of LISP.

	In the Fall of 1958, I became Assistant Professor of Communication
Sciences (in the EE Department) at M.I.T., and Marvin Minsky and I
started the M.I.T. Artificial Intelligence Project.  The Project
was supported by the M.I.T. Research Laboratory of Electronics
which had a contract from the armed services that permitted great
freedom to the Director, Professor Jerome Wiesner, in initiating
new projects that seemed to him of scientific interest.  No written
proposal was ever made.  When Wiesner asked Minsky and me what we
needed for the project, we asked for a room, two programmers,
a secretary and a keypunch,
and we were asked whether we would also undertake the supervision of
some of the six mathematics graduate students that
R.L.E. had undertaken to support.

	The implementation of LISP was undertaken in Fall 1958.  The
original idea was to produce a compiler, but this was considered a
major undertaking, so we started by hand-compiling various functions
into assembly language and writing subroutines to provide a LISP
"environment".  These included programs to read and print list structure.
I can't now remember whether the decision to use parenthesized list notation
was made then or whether it had already been used in discussing the
paper differentiation program.  The erasure problem also had to be
considered, and it was clearly unaesthetic to use explicit erasure
as did IPL.  There were two alternatives.

	The first alternative was to erase the old contents of a
program variable whenever it was updated.  Since the %2car%1 and %2cdr%1
operations were not to copy over structure, merging list structure
would occur, and so erasure would have required a system of
reference counts.  Since there were only six bits left in a word,
and these were in separate parts of the word, reference counts seemed
infeasible without a drastic change in the way list structures were
represented.  The alternative was %2garbage collection%1 in which
storage is abandoned until the free storage list is exhausted, the
storage accessible from program variables and the stack is marked,
and the unmarked storage is made into a new free storage list.
Once we decided on garbage collection, its actual implementation
could be postponed, because only toy examples were being done.
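	A schematic of the mark phase in a modern LISP, with a hash
table standing in for the mark bits that the 704 implementation
would keep in the words themselves (a sketch, not the code of the
period):
.begin verbatim
;; Mark everything reachable from a root by following both the
;; address (car) and decrement (cdr) pointers; the sweep phase then
;; strings the unmarked words into a new free storage list.
(defun mark (x marks)
  (unless (or (atom x) (gethash x marks))
    (setf (gethash x marks) t)
    (mark (car x) marks)
    (mark (cdr x) marks)))
.end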

	At that time it was also decided to use SAVE and UNSAVE
routines that use a stack in order to save the values of variables
and subroutine return addresses in the implementation of recursive
subroutines.
Another decision was to give up the prefix and tag parts of the
word, to abandon %2cwr%1, and to make %2cons%1 a function of
two arguments.  This left us with only a single type - the
15 bit address - so that
functions didn't have to be restricted in the arguments they could
take.

	This simplification made LISP into a way of describing computable
functions much neater than the Turing machines or general recursive
definitions used in recursive function theory.  Actually the fact
that Turing machines constitute an awkward programming language
doesn't much bother recursive function theorists, because they
almost never have any reason to write particular recursive definitions,
since the theory concerns recursive functions in general.  They often
have reason to prove that recursive functions with specific properties
exist, but this can be done by an informal argument without having
to write them down explicitly.  Anyway, I decided to write a paper
describing LISP both as a programming language and as a formalism
for doing recursive function theory.  The paper was %2Recursive
functions of symbolic expressions and their computation by machine, part I%1
which appeared in the
%2Communications of the ACM%1 in April 1960.  Part II was never written
but was intended to contain applications to computing with algebraic
expressions.

	One way to show how much neater LISP was than Turing machines was
to write a universal LISP function and show how much briefer and
more comprehensible it was than the description of a universal
Turing machine.  This was the LISP function %2eval[e,a]%1 that
computes the value of a LISP expression %2e%1 - the second argument
%2a%1 being a list of assignments of values to variables needed to
make the recursion work.  Writing %2eval%1 required inventing a notation
representing LISP functions as LISP data, and such a notation was
devised without much thought for the purposes of the paper.  S.R. Russell,
then one of the two programmers, noticed that %2eval%1 could serve
as an interpreter for LISP, promptly hand coded it, and we now had
a programming language.
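	A drastically shortened sketch of the idea in a modern LISP
(the real definition is in the 1960 paper; the names %2eval*%1,
%2evcon*%1 and %2apply*%1 and the tiny set of primitives here are
choices made for the illustration):
.begin verbatim
;; e is a LISP expression represented as LISP data; a is the list of
;; assignments of values to variables, here a (variable . value) alist.
(defun eval* (e a)
  (cond ((eq e t) t)
        ((null e) nil)
        ((atom e) (cdr (assoc e a)))            ; variable lookup
        ((eq (car e) 'QUOTE) (cadr e))          ; (QUOTE x) => x
        ((eq (car e) 'COND) (evcon* (cdr e) a))
        (t (apply* (car e)
                   (mapcar (lambda (x) (eval* x a)) (cdr e))
                   a))))

(defun evcon* (cs a)                 ; evaluate only the chosen branch
  (if (eval* (caar cs) a)
      (eval* (cadar cs) a)
      (evcon* (cdr cs) a)))

(defun apply* (f args a)
  (cond ((eq f 'CAR)  (car  (first args)))
        ((eq f 'CDR)  (cdr  (first args)))
        ((eq f 'CONS) (cons (first args) (second args)))
        ((eq f 'ATOM) (atom (first args)))
        ((eq f 'EQ)   (eq   (first args) (second args)))
        ((atom f) (apply* (cdr (assoc f a)) args a))  ; named function
        ((eq (car f) 'LAMBDA)        ; ((LAMBDA (v ...) body) args)
         (eval* (caddr f) (append (pairlis (cadr f) args) a)))))

;; (eval* '(CAR (QUOTE (A B))) nil) => A
.end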


4. Improvements.  LISP I and LISP 1.5.

	a. Property lists.  The idea of providing each atom with a list
of properties was present in the first assembly language implementation.
The READ and PRINT programs required that the print names of atoms be
accessible, and as soon as function definition became possible, it was
necessary to indicate whether a function was a SUBR in machine code
or was an EXPR represented by list structure.  Several functions dealing
with property lists were also made available for application
programs which made heavy use of them.
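	In later notation (the atom %2ff%1 and its definition here are
invented for the illustration), a property list lets the system
record, for example, that an atom names an EXPR:
.begin verbatim
(setf (get 'ff 'expr) '(lambda (x) (car (car x))))  ; store a property
(get 'ff 'expr)   ; => (LAMBDA (X) (CAR (CAR X)))
.end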

	b. Insertion of elements in lists and their deletion.
One of the original advertised virtues of list processing for AI
work was the ability to insert and delete elements of lists.
Unfortunately, this facility coexists uneasily with shared list
structure.  Moreover, operations that insert and delete don't have
a neat representation as functions.  LISP contains them in the form
of the %2rplaca%1 and %2rplacd%1 pseudo-functions, but programs that
use them cannot be conveniently represented in logic, because, regarded
as functions, they don't permit replacement of equals by equals.
Semantically, they must be represented by functions of state, and only
Michael Gordon really understands them.  All this could not be formulated
precisely while LISP was being developed, but it was clear that
%2rplaca%1 and %2rplacd%1 were anomalous.
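	The uneasy coexistence with shared list structure is easy to
exhibit in later notation:
.begin verbatim
(let* ((x (list 'a 'b 'c))
       (y (cdr x)))   ; y shares the tail of x
  (rplaca y 'z)       ; replace the first element of y in place
  x)                  ; => (A Z C): the change shows through x
.end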

	c. Numbers.  Many computations require both numbers and
symbolic expressions.  Numbers were implemented in LISP I as lists
of atoms, and this was useless for all but the simplest computations.
A reasonably efficient implementation of numbers as atoms in
S-expressions was made in LISP 1.5, but in all the early LISPs,
numerical computations were still 10 to 100 times slower than in
FORTRAN.  Efficient numerical computation requires some form of
typing in the source language and a distinction between numbers
treated by themselves and as elements of S-expressions.

	d. Free variables.  In all innocence, James R. Slagle
programmed the following LISP function definition and complained
when it didn't work right:

	%2testr[x,p,f,u] ← qif p[x] qthen f[x] qelse qif atom[x] qthen u[]
qelse testr[cdr[x],p,f,λ:testr[car[x],p,f,u]]%1.

The object of the function is to find a subexpression of ⊗x satisfying
%2p[x]%1 and return ⊗f[x].  If the search is unsuccessful, then the
continuation function ⊗u[] of no arguments is to be computed and its
value returned.  The difficulty was that when an inner recursion
occurred, the value of ⊗u wanted was the outer value, but the inner
value was actually used.  In modern terminology, lexical scoping was
wanted, and dynamic scoping was obtained.

	I must confess that I regarded this difficulty as just a bug
and expressed confidence that Steve Russell would soon fix it.  He did
fix it but by inventing the so-called FUNARG device that took the
lexical environment along with the functional argument.  Similar
difficulties later showed up in Algol 60, and Russell's FUNARG device
turned out to be one of the more comprehensive solutions to the problem.
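	In a modern lexically scoped LISP the λ closes over the ⊗u of
its own invocation, which is the behavior Slagle expected and which
the FUNARG device supplied; a sketch:
.begin verbatim
(defun testr (x p f u)
  (cond ((funcall p x) (funcall f x))
        ((atom x) (funcall u))
        (t (testr (cdr x) p f
                  (lambda () (testr (car x) p f u))))))

;; (testr '(a (b c)) (lambda (y) (eq y 'c)) #'identity
;;        (lambda () 'lost))  => C
.end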

	As a programming language LISP has many limitations.  Some of
the most evident in the early 1960s were weak numerical computation,
inability to represent objects by blocks of registers and garbage
collect the blocks, and lack of a good system for input-output of
symbolic expressions in conventional notations.  All these problems
and others were to be fixed in LISP 2.  Unfortunately, the LISP 2
project, undertaken by Information International and Systems
Development Corporation, proved too ambitious for the computer that
had to be used and for the budget of the project.  Therefore, we had
to settle for LISP 1.5 developed at M.I.T. which corrected only the
most glaring deficiencies.


5. Conclusions.

	LISP is now the second oldest programming language (after FORTRAN)
still in widespread use.  It owes its longevity to the fact that its core
occupies some kind of local optimum in the space of programming
languages given that static friction discourages purely notational
changes.  Recursive use of conditional expressions, representation
of data as lists, and representation of programs in the same language
as data will probably have a very long life.
.skip 1
.begin verbatim
John McCarthy
Artificial Intelligence Laboratory
Computer Science Department
Stanford University
Stanford, California 94305

ARPANET: MCCARTHY@SU-AI
.end

.turn on "{"
%7This draft of
LISP[F77,JMC]
PUBbed at {time} on {date}.%1